IMAGE RECOGNITION

This invention relates to the recognition of images, and is concerned, 
particularly although not exclusively, with the recognition of natural images. 

By "natural image" is meant an image of an object that occurs naturally - 
for example, an optical image such as a photograph, as well as images at other
wavelengths, such as x-ray and infra-red. The natural image
may be recorded and/or subsequently processed by digital means, but is in contrast 
to an image - or image data - that is generated or synthesised by computer or other 
artificial means. 

The recognition of natural images can be desirable for many reasons. For
example, distinctive landscapes and buildings can be recognised, to assist in the
identification of geographical locations. The recognition of human faces can be
useful for identification and security purposes. The recognition of valuable animals
such as racehorses may be very useful for identification purposes.

In this specification, we present in preferred embodiments of the invention
a new approach to face recognition.

Preferred embodiments of the present invention may be combined with 
techniques disclosed in our pending application GB0323662.7, a copy of the 
specification and drawings of which is attached, and to which the reader's attention 
is directed.

Previous work [1,2,3,4,5,6,7,8] has shown that the use of 3D face models
is able to overcome some of the problems associated with 2D face recognition.
Firstly, by relying on geometric shape, rather than colour and texture information,
systems become invariant to lighting conditions. Secondly, the ability to rotate a
facial structure in three-dimensional space, allowing for compensation of
variations in pose, aids those methods requiring alignment prior to recognition.
Finally, the additional discriminatory information available in the facial surface
structure, not available from two-dimensional images, provides additional cues for
recognition.

It has also been shown that the use of pre-processing techniques applied
prior to training and recognition, in which distinguishing features are made more
explicit, environmental effects are normalised and noise content is reduced, can
significantly improve recognition accuracy [1, 9, 10]. However, the focus of
previous research has been on identifying the optimum surface representation,
with little regard for the advantages offered by each individual surface
representation. We suggest that different surface representations may be
specifically suited to different capture conditions or certain facial characteristics,
despite having a general weakness for overall recognition. For example, curvature
representations may aid recognition by making the system more robust to
inaccuracies in 3D orientation, yet be highly sensitive to noise. Another
representation may enhance nose shape, but lose the relative positions of facial
features. The benefit of using multiple eigenspaces has previously been examined
by Pentland et al [11], in which specialist eigenspaces were constructed for various
facial orientations and local facial regions, from which cumulative match scores
were able to reduce error rates. Our approach differs in that we extract and
combine individual dimensions, creating a single unified surface space. This
approach has been shown to work effectively when applied to two-dimensional
images by Heseltine et al [12].

Here we analyse and evaluate a range of 3D face recognition systems, 
each utilising a different surface representation of the facial structure, in an 
attempt to identify and isolate the advantages offered by each representation.
Focusing on the fishersurface method of face recognition, we propose a means of 
selecting and extracting components from the surface subspace produced by each 
system, such that they may be combined into a unified surface space. 



Prior to training and testing, 3D face models are converted into one of the 
following surface representations. This is done by firstly orientating the 3D face 
model to face directly forwards, then projecting into a depth map. The surfaces in 
the table below are then derived by pre-processing of depth maps. 



Surface representation   Kernel                      Description

Horizontal Derivative    [-1 1]                      Applies the 2x1 kernel to compute the horizontal derivative
Vertical Derivative      [-1; 1]                     Applies the 1x2 kernel to compute the vertical derivative
(name not legible)       (not shown)                 Horizontal gradient over a greater horizontal distance
(name not legible)       (not shown)                 Vertical gradient over a greater vertical distance
Laplacian                [0 1 0; 1 -4 1; 0 1 0]      An isotropic measure of the second spatial derivative
Sobel X                  [-1 0 1; -2 0 2; -1 0 1]    Application of the horizontal Sobel derivative filter
Sobel Y                  [1 2 1; 0 0 0; -1 -2 -1]    Application of the vertical Sobel derivative filter
Sobel Magnitude          -                           The magnitude of Sobel X and Y combined
Horizontal Curvature     -                           Applies Sobel X twice to calculate the second horizontal derivative
Vertical Curvature       -                           Applies Sobel Y twice to calculate the second vertical derivative
(name not legible)       -                           The magnitude of the vertical and horizontal curvatures
Curve Type               -                           Segmentation of the surface into 8 discrete curvature types
Min Curvature            -                           The minimum of the horizontal and vertical curvature values
Max Curvature            -                           The maximum of the horizontal and vertical curvature values
Abs Min Curvature        -                           The minimum of the absolute horizontal and vertical curvatures
Abs Max Curvature        -                           The maximum of the absolute horizontal and vertical curvatures

Kernel rows are separated by semicolons.
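As an illustration of how such representations might be derived, the following Python sketch applies the kernels listed above to a depth map held as a list of rows. The function names, the use of unflipped (correlation-style) filtering and the cropping of border pixels are assumptions made for the example, not details taken from this specification.

```python
# Illustrative sketch only: border handling and naming are assumptions.
HORIZONTAL_DERIVATIVE = [[-1, 1]]
VERTICAL_DERIVATIVE = [[-1], [1]]
LAPLACIAN = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]


def apply_kernel(depth, kernel):
    """Slide the kernel over the depth map without flipping or padding."""
    kh, kw = len(kernel), len(kernel[0])
    rows = len(depth) - kh + 1
    cols = len(depth[0]) - kw + 1
    return [
        [
            sum(kernel[i][j] * depth[r + i][c + j]
                for i in range(kh) for j in range(kw))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def sobel_magnitude(depth):
    """Combine the Sobel X and Sobel Y responses into a single magnitude."""
    gx = apply_kernel(depth, SOBEL_X)
    gy = apply_kernel(depth, SOBEL_Y)
    return [[(x * x + y * y) ** 0.5 for x, y in zip(row_x, row_y)]
            for row_x, row_y in zip(gx, gy)]


def horizontal_curvature(depth):
    """Apply Sobel X twice to approximate the second horizontal derivative."""
    return apply_kernel(apply_kernel(depth, SOBEL_X), SOBEL_X)
```

On a depth map that rises by one unit per column, the horizontal derivative is constant, while the Laplacian and the horizontal curvature vanish, which is a convenient sanity check for the kernels.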



We give here a brief explanation of the fisherface method of face
recognition, as described by Belhumeur et al [13], and how it is applied to
three-dimensional face surfaces, termed the fishersurface method. We apply both
principal component analysis and linear discriminant analysis to surface
representations of 3D face models, producing a subspace projection matrix,
similar to that used in the eigenface [11] and eigensurface [1] methods. However,
the fishersurface method is able to take advantage of 'within-class' information,
minimising variation between multiple face models of the same person, yet still
maximising class separation. To accomplish this, we expand the training set to
contain multiple examples of each subject, describing the variance of a person's
face structure (due to influences such as facial expression and head orientation)
from one face model to another, as shown in equation 1.



    \text{Training Set} = \{ \underbrace{\Gamma_1, \Gamma_2, \ldots}_{X_1}, \underbrace{\ldots}_{X_2}, \underbrace{\ldots}_{X_3}, \underbrace{\ldots}_{X_4}, \ldots, \underbrace{\ldots, \Gamma_M}_{X_c} \}    (1)

where \Gamma_i is a facial surface and the training set is partitioned into c classes,
such that each surface in each class Xi is of the same person and no single person
is present in more than one class. We continue by computing three scatter
matrices, representing the within-class (Sw), between-class (Sb) and total (St)
distribution of the training set throughout surface space, shown in equation 2.
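Equation 2 is not legible in this copy. For orientation only, the scatter matrices are conventionally defined in the fisherface method [13] as follows, assuming \Gamma denotes a facial surface, \Psi the average surface of the whole training set of M surfaces and \Psi_i the average surface of class X_i; the notation is an assumption and may differ from the original equation.

```latex
\begin{aligned}
S_B &= \sum_{i=1}^{c} |X_i| \, (\Psi_i - \Psi)(\Psi_i - \Psi)^T \\
S_W &= \sum_{i=1}^{c} \sum_{\Gamma_k \in X_i} (\Gamma_k - \Psi_i)(\Gamma_k - \Psi_i)^T \\
S_T &= \sum_{n=1}^{M} (\Gamma_n - \Psi)(\Gamma_n - \Psi)^T
\end{aligned}
```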



where \Psi = \frac{1}{M} \sum_{i=1}^{M} \Gamma_i is the average surface of the entire training set, and
\Psi_i = \frac{1}{|X_i|} \sum_{\Gamma_k \in X_i} \Gamma_k is the average of class Xi. By performing PCA using the total scatter
matrix St, and taking the top M-c principal components, we produce a projection
matrix U_pca, used to reduce dimensionality of the within-class scatter matrix,
ensuring it is non-singular before computing the top c-1 (in this case 49)
eigenvectors of the reduced scatter matrix ratio, U_fld, as shown in equation 3.

    U_{fld} = \arg\max_{U} \left( \frac{ | U^T U_{pca}^T S_B U_{pca} U | }{ | U^T U_{pca}^T S_W U_{pca} U | } \right)    (3)


Finally, the matrix Uff is calculated as shown in equation 4, such that it
may project a face surface into a reduced surface space of c-1 dimensions, in which
the between-class scatter is maximised for all c classes, while the within-class
scatter is minimised for each class Xi.

    U_{ff} = U_{fld} U_{pca}    (4)
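The training procedure described above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions rather than the implementation used in the experiments: eigenvectors are stored as rows so that the product order matches equation 4, the scatter matrices follow the conventional fisherface definitions, and the function name `train_fishersurface` is hypothetical.

```python
import numpy as np


def train_fishersurface(surfaces, labels):
    """Return (average surface, U_ff) from row-vector surfaces and class labels."""
    X = np.asarray(surfaces, dtype=float)      # shape (M, n)
    y = np.asarray(labels)
    classes = np.unique(y)
    M, c = X.shape[0], classes.size
    psi = X.mean(axis=0)                       # average surface of the training set
    A = X - psi

    # PCA on the total scatter: the leading right singular vectors of the
    # centred data are the leading eigenvectors of S_T. Keep the top M - c.
    _, _, vt = np.linalg.svd(A, full_matrices=False)
    U_pca = vt[: M - c]                        # shape (M - c, n)

    # Within-class and between-class scatter in the reduced PCA space.
    P = A @ U_pca.T
    d = P.shape[1]
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for k in classes:
        Pk = P[y == k]
        mk = Pk.mean(axis=0)
        S_w += (Pk - mk).T @ (Pk - mk)
        S_b += Pk.shape[0] * np.outer(mk, mk)  # the overall mean of P is zero

    # Top c - 1 eigenvectors of the scatter ratio S_w^{-1} S_b.
    evals, evecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    order = np.argsort(-evals.real)[: c - 1]
    U_fld = evecs[:, order].real.T             # shape (c - 1, M - c)

    return psi, U_fld @ U_pca                  # U_ff has shape (c - 1, n)
```

A face-key for a surface `g` would then be obtained as `U_ff @ (g - psi)`, giving c-1 coefficients.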



Once the matrix Uff has been constructed it is used in much the same way
as the projection matrix in the eigenface and eigensurface systems, reducing
dimensionality of face surface vectors from 5330 to just 49 (c-1) elements. Again,
like the eigenface system, the components of the projection matrix can be viewed
as images, as shown in figure X for the depth map surface space.

Once surface space has been defined, we project a facial surface into
surface space by a simple matrix multiplication using the matrix Uff, as shown in
equation 5.

    \omega_k = u_k^T (\Gamma - \Psi)    for k = 1 ... c-1    (5)

where u_k is the kth eigenvector and \omega_k is the kth weight in the vector
\Omega^T = [\omega_1, \omega_2, \omega_3, ... \omega_{c-1}]. The c-1 coefficients represent the contribution of each
respective fishersurface to the original facial surface structure. The vector \Omega is
taken as the 'face-key' representing a person's facial structure in reduced
dimensionality surface space and compared using either euclidean or cosine
distance metrics as shown in equation 6.



An acceptance (the two facial surfaces match) or rejection (the two 
surfaces do not match) is determined by applying a threshold to the distance 
calculated. Any comparison producing a distance value below the threshold is
considered an acceptance. 
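The projection and verification steps can be sketched in plain Python as below. Equation 6 does not survive in this copy, so the distance definitions are assumptions: the usual euclidean distance, and a cosine distance taken as one minus the cosine of the angle between two face-keys. All function names are illustrative.

```python
import math


def project(surface, mean_surface, u_ff):
    """Equation 5: one weight per row u_k of U_ff, applied to (surface - mean)."""
    centred = [s - m for s, m in zip(surface, mean_surface)]
    return [sum(u * x for u, x in zip(row, centred)) for row in u_ff]


def euclidean_distance(a, b):
    """Straight-line distance between two face-keys."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """One minus the cosine of the angle between two face-keys (assumed form)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def verify(key_a, key_b, threshold, metric=euclidean_distance):
    """Accept (True) when the distance falls below the threshold."""
    return metric(key_a, key_b) < threshold
```

Raising the threshold accepts more comparisons, trading false rejections for false acceptances.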

Here we analyse the surface spaces produced when various facial surface 
representations are used with the fishersurface method. We begin by providing 
results showing the range of error rates produced when using various surface 
representations. The figure below clearly shows that the choice of surface 
representation has a significant effect on the effectiveness of the fishersurface 
method, with horizontal gradient representations providing the lowest equal error 
rates (EER, the error at which the false acceptance rate, FAR, equals the false
rejection rate, FRR).




[Figure: equal error rate for each surface representation]
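For readers unfamiliar with the measure, the equal error rate can be estimated from two lists of distances: those from comparisons of the same person (genuine) and those from comparisons of different people (impostor). The sketch below is one simple way of doing so, assuming a threshold sweep over the observed distances; it is not taken from this specification.

```python
def error_rates(genuine, impostor, threshold):
    """Return (FAR, FRR) when distances below the threshold are accepted."""
    far = sum(d < threshold for d in impostor) / len(impostor)
    frr = sum(d >= threshold for d in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Approximate the EER at the threshold where FAR and FRR are closest."""
    candidates = sorted(set(genuine) | set(impostor))
    candidates.append(candidates[-1] + 1.0)  # a threshold that accepts everything
    best = None
    for t in candidates:
        far, frr = error_rates(genuine, impostor, t)
        if best is None or abs(far - frr) < abs(best[0] - best[1]):
            best = (far, frr)
    return (best[0] + best[1]) / 2.0
```

With only a few comparisons the two rates may never coincide exactly, which is why the sketch averages them at the closest threshold.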



However, the superiority of the horizontal gradient representations does
not suggest that the vertical gradient and curvature representations are of no use
whatsoever. Although the discriminatory information provided by these
representations may not be as robust and distinguishing, they may still
make a positive contribution to the information already available in the
horizontal gradient representations. We now carry out further investigation into
the discriminating ability of each surface space by applying Fisher's Linear
Discriminant (FLD), as used by Gordon [3] to analyse 3D face features, to
individual components (single dimensions) of each surface space. Focusing on a
single face space dimension we calculate the discriminant d, describing the
discriminating power of that dimension, between c people.



    d = \frac{ \sum_{i=1}^{c} (m_i - m)^2 }{ \sum_{i=1}^{c} \frac{1}{|\Phi_i|} \sum_{x \in \Phi_i} (x - m_i)^2 }

where m is the mean value of that dimension in the face-keys, m_i the
within-class mean of class i and \Phi_i the set of vector elements taken from the
face-keys of class i. Applying the above equation to the assortment of surface space
systems generated using each facial surface representation, we see a wide range of
discriminant values describing the distinguishing ability of each individual
dimension, as shown below for the top ten most discriminating dimensions for
each surface representation.
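A minimal sketch of the discriminant for a single dimension follows. It assumes the usual one-dimensional form of Fisher's linear discriminant: the summed squared deviation of each class mean from the overall mean, divided by the summed within-class variances. The function name and the dictionary input are illustrative choices.

```python
def fld_discriminant(values_by_class):
    """Discriminant d for one dimension.

    values_by_class maps each class to the list of values that this
    dimension takes in the face-keys of that class.
    """
    all_values = [v for vals in values_by_class.values() for v in vals]
    m = sum(all_values) / len(all_values)      # mean over all face-keys
    between = 0.0
    within = 0.0
    for vals in values_by_class.values():
        m_i = sum(vals) / len(vals)            # within-class mean
        between += (m_i - m) ** 2
        within += sum((v - m_i) ** 2 for v in vals) / len(vals)
    return between / within
```

A dimension whose class means coincide gives d = 0, while well separated, tightly clustered classes give a large d.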
It is clear that although some surface representations do not perform well
in the face recognition tests, producing high EERs (for example min_curvature),
some of their face-key components do contain highly discriminatory information.
We hypothesise that the reason for these highly discriminating anomalies, in an
otherwise ineffective subspace, is that a certain surface representation may be
particularly suited to a single discriminating factor, such as nose shape or jaw
structure, but is not effective when used as a more general classifier. Therefore, if
we were able to isolate these few useful qualities from the more specialised
subspaces, they could be used to make a positive contribution to a generally more
effective surface space, reducing error rates further.

Here we describe how the analysis methods discussed above are used to
combine multiple face recognition systems. Firstly, we need to address the
problem of prioritising surface space dimensions. Because the average magnitude
and deviation of face-key vectors from a range of systems are likely to differ by
some orders of magnitude, certain dimensions will have a greater influence than
others, even if the discriminating abilities are evenly matched. To compensate for
this effect, we normalise moments by dividing each face-key element by its
within-class standard deviation. However, in normalising these dimensions we have also
removed any prioritisation, such that all face space components are considered
equal. Although not a problem when applied to a single surface space, when
combining multiple dimensions we would ideally wish to give greater precedence
to the more reliable components. Otherwise the situation is likely to arise in which a
large number of less discriminating (but still useful) dimensions begin to outweigh
the fewer more discriminating ones, diminishing their influence on the verification
operation and hence increasing error rates. We showed above how FLD can be used
to measure the discriminating ability of a single dimension from any given face
space. We now apply this discriminant value d as a weighting for each face space
dimension, prioritising those dimensions with the highest discriminating ability.
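The normalisation and weighting can be sketched as follows. The pooling of the within-class deviation across all classes is an assumption, since the text does not state how the per-class deviations are combined, and both function names are illustrative.

```python
import math


def within_class_std(values_by_class):
    """Pooled within-class standard deviation of one face-key dimension."""
    squared, count = 0.0, 0
    for vals in values_by_class.values():
        m_i = sum(vals) / len(vals)
        squared += sum((v - m_i) ** 2 for v in vals)
        count += len(vals)
    return math.sqrt(squared / count)


def weight_face_key(face_key, stds, discriminants):
    """Divide each element by its within-class deviation, then weight it by d."""
    return [d * x / s for x, s, d in zip(face_key, stds, discriminants)]
```

After this step a dimension with a large discriminant contributes proportionally more to any distance computed between two face-keys.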

With this weighting scheme applied to all face-keys produced by each
system, we can begin to combine dimensions into a single unified surface space.
In order to combine multiple dimensions from a range of surface spaces, we
require some criterion to decide which dimensions to combine. It is not enough
to rely purely on the discriminant value itself, as this only gives us an indication of
the discriminating ability of that dimension alone, without any indication of
whether the inclusion of this dimension would benefit the existing set of
dimensions. If an existing surface space already provides a certain amount of
discriminatory ability, it would be of little benefit (or could even be detrimental) if
we were to introduce an additional dimension describing a feature already present
within the existing set.

Previous investigations [12] have used FLD, applied to a combined
eigenspace, in order to predict its effectiveness when used for recognition.
Additional dimensions are then introduced if they result in a greater discriminant
value. Such a method has been shown to produce a 2D eigenspace combination
able to achieve significantly lower error rates in 2D face recognition, although
Heseltine et al also note that using the EER would likely provide better results,
at the cost of an extremely long processing time. However, with a more efficient
combination algorithm we now take that approach, such that the criterion required
for a new dimension to be introduced to an existing surface space is a resultant
decrease in the EER.

    Combined surface space = first dimension of current optimum system
    Calculate EER of combined surface space
    For each surface space system:
        For each dimension of surface space:
            Concatenate new dimension onto combined surface space
            Calculate EER of combined surface space
            If EER has not decreased:
                Remove new dimension from combined surface space
    Save combined surface space ready for evaluation
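The selection loop above might be expressed in Python as follows. This is a sketch under assumptions: a dimension is retained only when it lowers the EER, the EER evaluation is supplied by the caller as a function, and the names `combine_surface_spaces` and `evaluate_eer` are hypothetical.

```python
def combine_surface_spaces(systems, evaluate_eer, start):
    """Greedily select (system, dimension) pairs that lower the EER.

    systems: dict mapping a system name to its number of dimensions.
    evaluate_eer: callable taking a list of (name, dimension) pairs and
        returning the EER of the corresponding combined surface space.
    start: the (name, dimension) pair used to seed the combination.
    """
    combined = [start]
    best_eer = evaluate_eer(combined)
    for name, n_dims in systems.items():
        for dim in range(n_dims):
            candidate = (name, dim)
            if candidate in combined:
                continue
            combined.append(candidate)
            eer = evaluate_eer(combined)
            if eer < best_eer:
                best_eer = eer      # improvement: keep the new dimension
            else:
                combined.pop()      # no improvement: remove it again
    return combined, best_eer
```

Each candidate dimension requires one EER evaluation, so the cost grows linearly with the total number of dimensions across all systems.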



The figure below shows which dimensions from which surface space were
selected using the above algorithm, for inclusion in two combined systems: one
using the euclidean distance metric and the other using the cosine distance metric.

We now compare the combined surface space systems with the optimum
individual system, using both the cosine and euclidean distance measures.

The error curves above show the results obtained when the optimum
single fishersurface system and combined fishersurface system are applied to test
set A (used to construct the combined system), test set B (the unseen test set) and
the full test set (all images from sets A and B) using the cosine and euclidean
distance metrics. We see that the combined systems (dashed lines) do produce
lower error rates than the single systems for both the cosine and euclidean
distance measures. The optimum system can be seen as the fishersurface
combination using the cosine distance, producing an EER of 7.2%, 9.3% and 8.2%
for test set A, test set B and the full test set (A and B) respectively.

In one aspect, embodiments of the invention as described above apply the 
use of a fisherface method to the recognition of images, preferably natural images, 
preferably faces, and preferably human faces. 

In another aspect, one or more pre-processing methods are applied prior
to use of a fisherface method.

In another aspect, pre-processed data is combined prior to use of a 
fisherface method. 

As indicated above, methods as disclosed herein may be combined with 
advantage with those disclosed in our prior, pending application GB0323662.7.

In this specification, the verb "comprise" has its normal dictionary
meaning, to denote non-exclusive inclusion. That is, use of the word "comprise"
(or any of its derivatives) to include one feature or more, does not exclude the 
possibility of also including further features. 

The reader's attention is directed to all and any priority documents
identified in connection with this application and to all and any papers and
documents which are filed concurrently with or previous to this specification in
connection with this application and which are open to public inspection with this
specification, and the contents of all such papers and documents are incorporated
herein by reference.

All of the features disclosed in this specification, and/or all of the steps of
any method or process so disclosed, may be combined in any combination, except
combinations where at least some of such features and/or steps are mutually
exclusive.

Each feature disclosed in this specification may be replaced by alternative
features serving the same, equivalent or similar purpose, unless expressly stated
otherwise. Thus, unless expressly stated otherwise, each feature disclosed is one
example only of a generic series of equivalent or similar features.

The invention is not restricted to the details of the foregoing
embodiment(s). The invention extends to any novel one, or any novel
combination, of the features disclosed in this specification, or to any novel one, or
any novel combination, of the steps of any method or process so disclosed.

Fig. 1. Example face models taken from a 3D face database

Fig. 2. Orientation of a raw 3D face model (left) to a frontal pose (middle) and facial
surface depth map (right)

Fig. 3. Average depth map (left most) and first eight eigensurfaces

Fig. 4. Baseline 3D face recognition systems using facial surface depth maps and a
range of distance metrics

Fig. 5. Diagram of verification test procedure

Fig. 6. Error rates of 3D face recognition systems using optimum surface
representations and distance metrics

Fig. 7. Equal error rates of 3D face recognition systems using a variety
of surface representations and distance metrics

Fig. 8. Brief descriptions of surface representations
with the convolution kernels used